Supervised, semi-supervised and unsupervised inference of gene regulatory networks

Stefan R. Maetschke 1, Piyush B. Madhamshettiwar 1,
Melissa J. Davis 1 and Mark A. Ragan 1,2

1 The University of Queensland, Institute for Molecular Bioscience 
2 Australian Research Council Centre of Excellence in Bioinformatics 



Abstract 

Inference of gene regulatory networks from expression data is a challenging task. Many
methods have been developed for this purpose but a comprehensive evaluation that covers
unsupervised, semi-supervised and supervised methods, and provides guidelines for their
practical application, is lacking.

We performed an extensive evaluation of inference methods on simulated expression data. 
The results reveal very low prediction accuracies for unsupervised techniques with the notable 
exception of the z-score method on knock-out data. In all other cases the supervised approach
achieved the highest accuracies and, even in a semi-supervised setting with small numbers of
only positive samples, outperformed the unsupervised techniques.

1 Introduction



Mapping the topology of gene regulatory networks is a central problem in systems biology. The
regulatory architecture controlling gene expression also controls consequent cellular behavior such
as development, differentiation, homeostasis and response to stimuli, while deregulation of these
networks has been implicated in oncogenesis and tumor progression (Pe'er and Hacohen, 2011).
Experimental methods based e.g. on chromatin immunoprecipitation, DNaseI hypersensitivity
or protein-binding assays are capable of determining the nature of gene regulation in a given
system, but are time-consuming, expensive and require antibodies for each transcription factor
(Elnitski et al., 2006). Accurate computational methods to infer gene regulatory networks,
particularly methods that leverage genome-scale experimental data, are urgently required not only
to supplement empirical approaches but also, if possible, to explore these data in new,
more-integrative ways.

Many computational methods have been developed to infer regulatory networks from gene ex- 
pression data, predominately employing unsupervised techniques. Several comparisons have been 
made of network inference methods, but a comprehensive evaluation that covers unsupervised, 
semi-supervised and supervised methods is lacking, and many questions remain open. Here we 
address fundamental questions, including which methods are suitable for what kinds of experi- 
mental data types, and how many samples these methods require. 

The most-recent and largest comparison so far has been performed by Madhamshettiwar et al.
(2012). They compared the prediction accuracy of eight unsupervised and one supervised method
on 38 simulated data sets. The methods showed large differences in prediction accuracy but
the supervised method was found to perform best, despite the parameters of the unsupervised
methods having been optimized. Here we extend this study to 17 unsupervised methods and
include a direct comparison with supervised and semi-supervised methods on a wide range of
networks and experimental data types (knock-out, knock-down and multi-factorial).

Another comprehensive evaluation, limited to unsupervised methods, has been performed as
part of the Dialogue for Reverse Engineering Assessments and Methods (DREAM), an annual open
competition in network inference (Stolovitzky et al., 2007, 2009; Marbach et al., 2010; Prill et al.,
2010; Marbach et al., 2012). Results from DREAM highlight that network inference is a
challenging problem. To quote Prill et al. (2010): "The vast majority of the teams' predictions were
statistically equivalent to random guesses." However, an important result of the DREAM
competition is that under certain conditions simple methods can perform well: "...the z-score prediction
would have placed second, first, and first (tie) in the 10-node, 50-node, and 100-node subchallenges,
respectively" (Prill et al., 2010).

Unsupervised methods rely on expression data only but tend to achieve lower prediction
accuracies than supervised methods (Mordelet and Vert, 2008; Cerulo et al., 2010; Madhamshettiwar
et al., 2012). By contrast, supervised methods require information about known interactions for
training, and this information is typically sparse. Semi-supervised methods reflect a compromise and
can be trained with far fewer interaction data, but usually are not as accurate predictors as
supervised methods. One of the few comparisons with supervised methods was performed by
Mordelet and Vert (2008). They evaluated SIRENE (Supervised Inference of Regulatory
Networks) in comparison to CLR, ARACNE, Relevance Networks (RN) and a Bayesian Network
on an E. coli benchmark data set by Faith et al. (2007) and found that the supervised method
considerably outperformed the unsupervised techniques.

Cerulo et al. (2010) compared supervised and semi-supervised support vector machines with
two unsupervised methods and found the former superior. Our evaluation employs similar
supervised and semi-supervised methods but includes many more unsupervised methods, distinguishes
between experimental types and performs replicates, resulting in a more-complete picture. A
related evaluation by Schaffter et al. (2011) compared six unsupervised methods on larger networks
with 100, 200 and 500 nodes and simulated expression data. Again the z-score method was found
to be one of the top performers in knock-out experiments.

Several smaller evaluations have been performed but are largely restricted to four unsupervised
methods (ARACNE, CLR, MRNET and RN) in comparisons with a novel approach on small
data sets. The ARACNE method was introduced by Margolin et al. (2006) and showed superior
precision and recall when compared to RN and a Bayesian Network algorithm on simulated
networks. Meyer et al. (2007) compared all four unsupervised inference algorithms on large yeast
sub-networks (100 up to 1000 nodes) using simulated expression data, and Altay and Emmert-Streib
(2010) investigated the bias in the predictions of those algorithms. Faith et al. (2007) evaluated
CLR, ARACNE, RN and a linear regression model on E. coli interaction data from RegulonDB and
found CLR to outperform the other methods. Lopes et al. (2009) studied the prediction accuracy
of ARACNE, MRNET, CLR and SFFS+MCE, a feature selection algorithm, on simulated
networks and found the latter superior for networks with small node degree. Haynes and Brent (2009)
developed a synthetic regulatory network generator (GRENDEL) and measured the prediction
accuracy of ARACNE, CLR, DBmcmc and Symmetric-N for various network sizes and different
experimental types. Werhli et al. (2006) compared RN, graphical Gaussian models (GGMs) and
Bayesian networks (BNs) on the Raf pathway, a small cellular signalling network with 11 proteins,
and on simulated data. BNs and GGMs were found to outperform RN on observational data.
Camacho et al. (2007) compared Regulatory Strengths Analysis (RSA), Reverse Engineering by
Multiple Regression (NIR), Partial Correlations (PC) and Dynamic Bayesian Networks (BANJO)
on a small, simulated network with 10 genes, with different levels of noise. In the noise-free scenario
the PC method showed the highest accuracy. Finally, Cantone et al. (2009) constructed a small,
synthetic, in vivo network of five genes and measured time series and steady-state expression. In
an evaluation of BANJO, ARACNE and two models based on ordinary differential equations they
found the latter two to achieve the highest accuracies. Bansal et al. (2007) also evaluated BANJO,
ARACNE and ordinary differential equations but on random networks and simulated expression
data.

In the following sections we first describe the different inference methods in detail, before 
evaluating their prediction accuracies on simulated gene expression data and regulatory networks 
of varying size. We continue with a discussion of the prediction results and conclude with guidelines 
for the use of the evaluated methods. 



2 Methods 



We compared the prediction performance of unsupervised, semi-supervised and supervised
network inference methods. Following other authors (Husmeier, 2003; Mordelet and Vert, 2008;
Haynes and Brent, 2009) we assess prediction performance by the Area under the Receiver
Operator Characteristic curve (AUC)



$$\mathrm{AUC} = \frac{1}{2} \sum_{k=1}^{n} (X_k - X_{k-1})(Y_k + Y_{k-1}) \tag{1}$$

where $X_k$ is the false positive rate and $Y_k$ is the true positive rate for the k-th output in the
ranked list of predicted edge weights. An AUC of 1.0 indicates perfect prediction, while an AUC of 0.5
indicates a performance no better than random guessing.
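
As an illustration of how an AUC can be obtained from a ranked list of predicted edge weights, the following minimal sketch applies the trapezoidal rule to the ROC curve. The function name `auc_from_weights` and the grouping of tied weights into a single ROC step are assumptions made for this example, not a description of the evaluation code used in this study.

```python
def auc_from_weights(weights, labels):
    """Trapezoidal AUC for predicted edge weights.

    weights: predicted interaction weights, one per gene pair
    labels:  1 if the pair is a true edge, 0 otherwise
    Tied weights are treated as one step of the ROC curve (an assumption).
    """
    ranked = sorted(zip(weights, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y == 1)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true edge and one non-edge")

    auc = 0.0
    tp = fp = 0
    x_prev = y_prev = 0.0  # previous false/true positive rate
    i = 0
    while i < len(ranked):
        j = i
        # Consume all pairs that share the same weight.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        x, y = fp / n_neg, tp / n_pos
        auc += 0.5 * (x - x_prev) * (y + y_prev)  # area of one trapezoid
        x_prev, y_prev = x, y
        i = j
    return auc
```

A ranking that places all true edges first yields 1.0, the reverse ranking yields 0.0, and identical weights for all pairs yield 0.5.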

Note that in contrast to other measures such as F1 score, Matthews correlation, recall or
precision (Baldi et al., 2000), AUC does not require choice of a threshold to infer interactions
from predicted weights; rather, it compares the predicted weights directly to the topology of the
true network. In the Supplementary Material we nonetheless report results based on F1 score and
Matthews correlation.

To avoid discrepancies between the gene expression values generated by true networks and 
the actually known, partial networks, we performed our evaluations on simulated, steady-state 
expression data, generated from sub-networks extracted from E. coli and S. cerevisiae networks. 
This allowed us to assess the accuracy of an algorithm against a perfectly known true network
(Bansal et al., 2007). When comparing the true with the inferred network, the direction and
type of interactions were ignored, since many inference methods can infer only the existence of
an interaction. For the same reason self-interactions were excluded from the network
comparison. We employed GeneNetWeaver (Marbach et al., 2009; Schaffter et al., 2011) and SynTReN
(Van den Bulcke et al., 2006) to extract sub-networks and to simulate gene expression data.



GeneNetWeaver has been part of several evaluations, most prominently the DREAM chal- 
lenges. The simulator extracts sub-networks from known interaction networks such as those 
of E. coli and S. cerevisiae, emulates transcription and translation, and employs a set of ordi- 
nary differential equations describing chemical kinetics to generate expression data for knock-out, 
knock-down and multi-factorial experiments. 

To simulate knock-out experiments the expression value of each gene is in turn set to zero, 
whereas for knock-down experiments the expression value is halved. In multi-factorial experiments 
the expression levels of a small number of genes are perturbed by a small, random amount.

SynTReN is a similar but older simulator. Sub-graphs are also extracted from E. coli and S. 
cerevisiae networks but it simulates only the transcription level and multi-factorial experiments.
However, SynTReN is faster than GeneNetWeaver and allows one to vary the sample number 
independently of the network size. 

To enable a comprehensive and fair comparison we evaluated the prediction accuracies of these 
inference methods on sub-networks with different numbers of nodes (10,...,110) extracted from E. 
coli and S. cerevisiae, and used three experimental data types (knock-out, knock-down,
multi-factorial) with varying sample set sizes (10,...,110) simulated by GeneNetWeaver and SynTReN.

We performed no parameter optimization for unsupervised methods, since this would require 
training data (known interactions) and render those methods supervised. For the supervised and 
semi-supervised methods, 5-fold cross-validation was applied and parameters were optimized on 
the training data only. The following sections describe the inference methods in detail. 

2.1 Unsupervised 

This section describes the evaluated unsupervised methods. CLR, ARACNE, MRNET and
MRNET-B are part of the R package "minet" and were called with their default parameters (Meyer et al.,
2008), with the exception of ARACNE. With the default parameter eps = 0.0, ARACNE
performed very poorly and we used eps = 0.2 instead. Similarly, GENIE (Huynh-Thu et al., 2010),
MINE (Reshef et al., 2011) and PCIT (Reverter and Chan, 2008) were installed and evaluated with
default parameters. All other methods were implemented according to their respective publications.
SPEARMAN-C, EUCLID and SIGMOID are implementations of our own inference algorithms.

2.1.1 Correlation 

-based network inference methods assume that correlated expression levels between two genes are 
indicative of a regulatory interaction. Correlation coefficients range from +1 to -1 and a positive 
correlation coefficient indicates an activating interaction, while a negative coefficient indicates an 
inhibitory interaction. The common correlation measure by Pearson is defined as

$$\mathrm{corr}(X_i, X_j) = \frac{\mathrm{cov}(X_i, X_j)}{\sigma(X_i)\,\sigma(X_j)} \tag{2}$$

where $X_i$ and $X_j$ are the expression levels of genes i and j, $\mathrm{cov}(\cdot,\cdot)$ denotes the
covariance, and $\sigma(\cdot)$ is the standard deviation. Pearson's correlation measure assumes normally
distributed values, an assumption that does not necessarily hold for gene-expression data. Therefore
rank-based measures are frequently employed, with the measures by Spearman and Kendall being the
most common. Spearman's method is simply Pearson's correlation coefficient for the ranked expression
values, and Kendall's $\tau$ coefficient is computed as

$$\tau(X_i, X_j) = \frac{\mathrm{con}(X_i^r, X_j^r) - \mathrm{dis}(X_i^r, X_j^r)}{\frac{1}{2} n (n - 1)} \tag{3}$$

where $X_i^r$ and $X_j^r$ are the ranked expression profiles of genes i and j. $\mathrm{con}(\cdot,\cdot)$
denotes the number of concordant and $\mathrm{dis}(\cdot,\cdot)$ the number of discordant value pairs in
$X_i^r$ and $X_j^r$, with both profiles being of length n.

Since our evaluation of prediction accuracy does not distinguish between inhibiting and
activating interactions, the predicted interaction weights are computed as the absolute value of the
correlation coefficients

$$w_{ij} = |\mathrm{corr}(X_i, X_j)| \tag{4}$$
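
The three correlation measures and the resulting interaction weight can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function names are hypothetical, tied values receive average ranks (an assumption), and the Kendall variant follows Equation 3 literally, without a correction for ties.

```python
import math

def pearson(x, y):
    """Pearson's correlation: covariance divided by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(x):
    """Ranks starting at 1; tied values share their average rank (assumption)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's tau as (concordant - discordant) / (n(n-1)/2)."""
    n = len(x)
    con = dis = 0
    for a in range(n):
        for b in range(a + 1, n):
            s = (x[a] - x[b]) * (y[a] - y[b])
            if s > 0:
                con += 1
            elif s < 0:
                dis += 1
    return (con - dis) / (0.5 * n * (n - 1))

def interaction_weight(x, y, corr=pearson):
    """Absolute correlation, ignoring activation versus inhibition."""
    return abs(corr(x, y))
```

For a monotonic but non-linear relationship such as x = [1, 2, 3, 4] and y = [1, 4, 9, 100], the rank-based measures return 1 while Pearson's coefficient is smaller than 1.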



2.1.2 SPEARMAN-C 

is a modification of Spearman's correlation coefficient where we attempted to favor hub nodes, 
which have many, strong interactions. The correlation coefficient is corrected by multiplying it 
by the mean correlation of gene i with all other genes k, and the absolute value is taken as the 
interaction weight

$$w_{ij} = \left| \mathrm{corr}(X_i, X_j) \cdot \frac{1}{n} \sum_{k} \mathrm{corr}(X_i, X_k) \right| \tag{5}$$

where $\mathrm{corr}(\cdot,\cdot)$ is Spearman's correlation coefficient.



2.1.3 WGCNA 



stands for Weighted Gene Co-expression Network Analysis (Langfelder and Horvath, 2008) and is
a modification of correlation-based inference methods that amplifies high correlation coefficients
by raising the absolute value to the power of $\beta$ ("softpower")

$$w_{ij} = |\mathrm{corr}(X_i, X_j)|^{\beta} \tag{6}$$

with $\beta > 1$. Since softpower is a non-linear but monotonic transformation of the correlation
coefficient, the prediction accuracy measured by AUC will be no different from that of the underlying
correlation method itself. Consequently we show only results for correlation methods but not for
the WGCNA modification, which would be identical.
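
The argument that a monotonic transformation leaves a rank-based measure unchanged can be checked with a small sketch. The value beta = 6 and both function names are assumptions chosen for illustration.

```python
def softpower(correlations, beta=6):
    """Raise absolute correlations to the power beta (beta is an assumed value)."""
    return [abs(c) ** beta for c in correlations]

def ranking(values):
    """Indices of the values ordered from largest to smallest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

correlations = [0.9, -0.2, 0.5, 0.7]
plain = ranking([abs(c) for c in correlations])
powered = ranking(softpower(correlations))
# Both rankings are identical, so any measure that depends only on the
# order of the predicted weights, such as AUC, is unchanged.
```

Because AUC depends only on the order of the predicted weights, identical rankings imply identical AUC values.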



2.1.4 RN 



(relevance networks) by Butte and Kohane (2000) measure the mutual information (MI) between
gene expression profiles to infer interactions. The mutual information $I$ between discrete variables
$X_i$ and $X_j$ is defined as

$$I(X_i, X_j) = \sum_{x_i \in X_i} \sum_{x_j \in X_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)} \tag{7}$$

where $p(x_i, x_j)$ is the joint probability distribution of $X_i$ and $X_j$, and $p(x_i)$ and $p(x_j)$ are the
marginal probabilities. $X_i$ and $X_j$ are required to be discrete variables. We used equal-width
binning for discretization and empirical entropy estimation as described by Meyer et al. (2008).
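
A minimal sketch of mutual information with equal-width binning and empirical (plug-in) probabilities is given below. The number of bins, the natural logarithm and the function names are assumptions for illustration; the minet package may choose the bin count differently.

```python
import math
from collections import Counter

def discretize(profile, bins):
    """Equal-width binning of a continuous expression profile."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / bins
    # The maximum value falls on the upper edge and is put in the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in profile]

def mutual_information(x, y, bins=3):
    """Empirical mutual information (in nats) of two discretized profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

Two identical binary profiles give log 2, the entropy of the profile, while two independent profiles give 0.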



2.1.5 CLR 



is the abbreviation for Context Likelihood of Relatedness (Faith et al., 2007) and extends the
relevance network method (RN) by taking the background distribution of the mutual information
values $I(X_i, X_j)$ into account. The most probable interactions are those that deviate most from
the background distribution and for each gene i a maximum z-score $z_i$ is calculated as

$$z_i = \max_j \left( 0, \frac{I(X_i, X_j) - \mu_i}{\sigma_i} \right) \tag{8}$$

where $\mu_i$ and $\sigma_i$ are the mean value and standard deviation, respectively, of the mutual
information values $I(X_i, X_k)$, k = 1,...,n. The interaction between two genes i and j is then defined as

$$w_{ij} = \sqrt{z_i^2 + z_j^2} \tag{9}$$

The background correction step aims to reduce the prediction of false interactions based on
spurious correlations and indirect interactions.
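
The background correction can be sketched as follows for a precomputed, symmetric mutual information matrix. Note the assumptions: the sketch uses the pairwise form of the z-score, in which the score of gene i is evaluated for the specific partner j, as commonly described for CLR; the diagonal is excluded from the background statistics; and the function name is hypothetical. It is not necessarily identical to the minet implementation used in this study.

```python
import math

def clr_weights(mi):
    """CLR-style weights from a symmetric mutual information matrix.

    Assumption: pairwise z-scores z(i, j) = max(0, (I_ij - mu_i) / sigma_i),
    combined as sqrt(z(i, j)^2 + z(j, i)^2).
    """
    n = len(mi)
    mean, std = [], []
    for i in range(n):
        row = [mi[i][k] for k in range(n) if k != i]  # background of gene i
        m = sum(row) / len(row)
        s = math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
        mean.append(m)
        std.append(s)

    def z(i, j):
        if std[i] == 0:
            return 0.0
        return max(0.0, (mi[i][j] - mean[i]) / std[i])

    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return w
```

Pairs whose mutual information lies below the background mean of both genes receive a weight of zero, which is how the correction suppresses weak, unspecific associations.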



2.1.6 ARACNE 



stands for Algorithm for the Reconstruction of Accurate Cellular Networks (Margolin et al., 2006)
and is another modification of the relevance network that applies the Data Processing Inequality
(DPI) to filter out indirect interactions. The DPI states that, if gene i interacts with gene j via
gene k, then the following inequality holds:

$$I(X_i, X_j) \le \min\left( I(X_i, X_k), I(X_k, X_j) \right) \tag{10}$$

ARACNE considers all possible triplets of genes (interaction triangles) and computes the
mutual information values for each gene pair within the triplet. Interactions within an interaction
triangle are assumed to be indirect and are therefore pruned if they violate the DPI beyond a
specified tolerance threshold eps. We used a threshold of eps = 0.2 for our evaluations.
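
The pruning step can be illustrated with a short sketch over a symmetric mutual information matrix. The additive form of the tolerance (the weakest edge of a triangle is removed if it lies more than eps below the second-weakest edge) is an assumption for this example; implementations differ in how the tolerance is applied, and the function name is hypothetical.

```python
from itertools import combinations

def aracne_prune(mi, eps=0.2):
    """Remove the weakest edge of each gene triplet if it violates the DPI.

    Assumption: additive tolerance, i.e. the weakest edge is pruned when its
    mutual information is below (second weakest - eps).
    """
    n = len(mi)
    to_remove = set()
    for i, j, k in combinations(range(n), 3):
        edges = sorted([(mi[i][j], (i, j)), (mi[i][k], (i, k)), (mi[j][k], (j, k))])
        weakest, second = edges[0], edges[1]
        if weakest[0] < second[0] - eps:
            to_remove.add(weakest[1])
    # All decisions are based on the original matrix; pruning is applied afterwards.
    pruned = [row[:] for row in mi]
    for a, b in to_remove:
        pruned[a][b] = pruned[b][a] = 0.0
    return pruned
```

With eps = 0.2, a triangle with mutual information values 0.9, 0.8 and 0.3 loses its 0.3 edge, whereas a larger tolerance such as eps = 0.6 keeps all three edges.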

2.1.7 PCIT 



is an abbreviation of Partial Correlation and Information Theory (Reverter and Chan, 2008) and
is similar to ARACNE. PCIT extracts all possible interaction triangles and applies the DPI to
filter indirect interactions, but uses partial correlation coefficients instead of mutual information
as interaction weights. The partial correlation coefficient $\mathrm{corr}^{partial}_{ij}$ between two genes i and j
within an interaction triangle (i, j, k) is defined as

$$\mathrm{corr}^{partial}_{ij} = \frac{\mathrm{corr}(X_i, X_j) - \mathrm{corr}(X_i, X_k)\,\mathrm{corr}(X_j, X_k)}{\sqrt{(1 - \mathrm{corr}(X_i, X_k)^2)(1 - \mathrm{corr}(X_j, X_k)^2)}} \tag{11}$$

where $\mathrm{corr}(\cdot,\cdot)$ is Pearson's correlation coefficient. The partial correlation coefficient aims to
eliminate the effect of the third gene k on the correlation of genes i and j.
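
The first-order partial correlation within one triangle can be written as a small helper that takes the three pairwise Pearson coefficients. The sketch below uses the standard textbook form and a hypothetical function name; it covers only this single quantity, not the complete PCIT filtering procedure.

```python
import math

def partial_corr(r_ij, r_ik, r_jk):
    """Correlation of genes i and j after removing the linear effect of gene k.

    Inputs are pairwise Pearson coefficients. The result is undefined
    (division by zero) if gene k is perfectly correlated with i or j.
    """
    denom = math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))
    return (r_ij - r_ik * r_jk) / denom
```

If gene k is uncorrelated with both genes, the partial correlation equals the plain correlation; if the correlation between i and j is fully explained by k (r_ij = r_ik * r_jk), it vanishes.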

2.1.8 MRNET 

(Meyer et al., 2007) employs mutual information between expression profiles and a feature selection
algorithm (MRMR) to infer interactions between genes. More precisely, the method places each
gene in the role of a target gene j with all other genes V as its regulators. The mutual information
between the target gene and the regulators is calculated and the Minimum-Redundancy-Maximum-Relevance
(MRMR) method is applied to select the best subset of regulators. MRMR step-by-step
builds a set S by selecting the genes $i^{MRMR}$ with the largest mutual information value and the
smallest redundancy based on the following definition

$$i^{MRMR} = \arg\max_{i \in V \setminus S} (s_i) \tag{12}$$

with $s_i = u_i - r_i$. The relevance term $u_i = I(X_i, X_j)$ is thereby the mutual information
between gene i and target j, and the redundancy term $r_i$ is defined as

$$r_i = \frac{1}{|S|} \sum_{k \in S} I(X_i, X_k) \tag{13}$$

Interaction weights $w_{ij}$ are finally computed as $w_{ij} = \max(s_i, s_j)$.



2.1.9 MRNET-B 



is a modification of MRNET that replaces the forward selection strategy to identify the best
subset of regulator genes with a backward selection strategy followed by a sequential
replacement (Meyer et al., 2010).



2.1.10 GENIE 



(GEne Network Inference with Ensemble of trees) is similar to MRNET in that it also lets each gene
take on the role of a target regulated by the remaining genes and then employs a feature selection
procedure to identify the best subset of regulator genes. In contrast to MRNET, Random Forests
and Extra-Trees are used for regression and feature selection (Huynh-Thu et al., 2010) rather than
mutual information and MRMR.

2.1.11 SIGMOID 

models the regulation of a gene by a linear combination with soft thresholding. The predicted
expression value $X'_{ik}$ of gene i at time point k is described by the sum over the weighted expression
values $X_{jk}$ of the remaining genes, constrained by a sigmoid function $\sigma(\cdot)$

$$X'_{ik} = \sigma\left( \sum_{j \ne i} X_{jk} w_{ij} + b_i \right) \tag{14}$$

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{15}$$

The regulatory weights $w_{ij}$ are determined by minimizing the following quadratic error function
over the predicted expression values $X'_{ik}$ and the observed values $X_{ik}$:

$$E = \sum_{i} \sum_{k} \left( X_{ik} - X'_{ik} \right)^2 \tag{16}$$

Finally, the interaction weights $w'_{ij}$ for the undirected network are computed by averaging over
the forward and backward weights:

$$w'_{ij} = \frac{|w_{ij}| + |w_{ji}|}{2} \tag{17}$$

2.1.12 MD 

(Mass-Distance) by Yona et al. (2006) is a similarity measure for expression profiles. It estimates
the probability to observe a profile inside the volume delimited by the profiles. The smaller the
volume, the more similar are the two profiles. Given two expression profiles $X_i$ and $X_j$, the total
probability mass of samples whose k-th feature is bounded between the expression values $X_{ik}$ and
$X_{jk}$ is calculated as

$$\mathrm{MASS}_k(X_i, X_j) = \sum_{\min(X_{ik}, X_{jk}) \le x \le \max(X_{ik}, X_{jk})} \mathrm{freq}(x) \tag{18}$$

where freq(x) is the empirical frequency. The mass distance $\mathrm{MD}_{ij}$ is defined as the total volume of
profiles bounded between the two expression profiles $X_i$ and $X_j$ and is estimated by the product
over all coordinates k

$$\mathrm{MD}_{ij} = \prod_{k}^{n} \mathrm{MASS}_k(X_i, X_j) \tag{19}$$

where n is the length of the expression profiles. Since $\mathrm{MD}_{ij}$ is symmetric and positive we
interpret it directly as an interaction weight $w_{ij}$.

2.1.13 MR 

(mutual rank) by Obayashi and Kinoshita (2009) employs ranked Pearson's correlation as a
measure to describe gene coexpression. For a gene i, first Pearson's correlation with all other genes
k is computed and ranked. Then the rank achieved for gene j is taken as score to describe the
similarity of the gene expression profiles $X_i$ and $X_j$

$$\mathrm{rank}_{ij} = \mathrm{rank}_j\left( \mathrm{corr}(X_i, X_k),\ \forall k \ne i \right) \tag{20}$$

with $\mathrm{corr}(\cdot,\cdot)$ being Pearson's correlation coefficient. The final interaction weight $w_{ij}$ is calculated
as the geometric average of the ranked correlation between gene i and j and vice versa:

$$w_{ij} = \sqrt{\mathrm{rank}_{ij} \cdot \mathrm{rank}_{ji}} \tag{21}$$
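
A compact sketch of the mutual rank computation is shown below. It assumes that ranks are assigned in ascending order of the signed Pearson correlation, so that the most strongly correlated partner of a gene receives the largest rank and larger weights indicate stronger coexpression; this ordering and the function names are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mutual_rank(profiles):
    """Geometric mean of the two directed correlation ranks for each gene pair."""
    n = len(profiles)
    corr = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    rank = [[0] * n for _ in range(n)]
    for i in range(n):
        # Assumption: ascending order, highest correlation gets the largest rank.
        others = sorted((k for k in range(n) if k != i), key=lambda k: corr[i][k])
        for r, k in enumerate(others, start=1):
            rank[i][k] = r
    return [[math.sqrt(rank[i][j] * rank[j][i]) if i != j else 0.0
             for j in range(n)] for i in range(n)]
```

Because the rank of gene j from the viewpoint of gene i generally differs from the rank of gene i from the viewpoint of gene j, combining both directions yields a symmetric weight.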

2.1.14 MINE 



is a class of Maximal Information-based Nonparametric Exploration statistics by Reshef et al. (2011).
The Maximal Information Coefficient (MIC) is part of this class and a novel measure to quantify
non-linear relationships. We computed the MIC for expression profiles $X_i$ and $X_j$ and interpreted
the MIC score as an interaction weight

$$w_{ij} = \mathrm{MIC}(X_i, X_j) \tag{22}$$

2.1.15 EUCLID 

is a simple method that employs the Euclidean distance between the normalized expression profiles
$X'_i$ and $X'_j$ of two genes as interaction weights

$$w_{ij} = \sqrt{\sum_{k} \left( X'_{ik} - X'_{jk} \right)^2} \tag{23}$$

where profiles are normalized by computing the absolute difference of expression values $X_{ik}$ to the
median expression in profile $X_i$

$$X'_{ik} = |X_{ik} - \mathrm{median}(X_i)| \tag{24}$$



2.1.16 Z-SCORE 



is a network inference strategy by Prill et al. (2010) that takes advantage of knock-out data. It
assumes that a knock-out affects directly interacting genes more strongly than others. The z-score
$z_{ij}$ describes the effect of a knock-out of gene i in the k-th experiment on gene j as the normalized
deviation of the expression level $X_{jk}$ of gene j for experiment k from the average expression $\mu(X_j)$
of gene j:

$$z_{ij} = \frac{|X_{jk} - \mu(X_j)|}{\sigma(X_j)} \tag{25}$$

The original Z-SCORE method requires knowledge of the knock-out experiment k and is therefore
not directly applicable to data from multi-factorial experiments. The method, however, can easily
be generalized by assuming that the minimum expression value within a profile indicates the
knock-out experiment ($\min(X_j) = X_{jk}$). Equation 25 then becomes

$$z_{ij} = \frac{|\min(X_j) - \mu(X_j)|}{\sigma(X_j)} \tag{26}$$

and the method can be applied to knock-out, knock-down and multi-factorial data. Note that $z_{ij}$
is an asymmetric score and we therefore take the maximum of $z_{ij}$ and $z_{ji}$ to compute the final
interaction weight $w_{ij}$ as

$$w_{ij} = \max(z_{ij}, z_{ji}) \tag{27}$$
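
For knock-out data with a known experimental design, the z-score weights can be sketched as below. The sketch assumes that experiment k is the knock-out of gene k (so the expression matrix is square), uses the population standard deviation, and covers only the original variant with known knock-out experiments; the function name is hypothetical.

```python
import math

def zscore_weights(expr):
    """Symmetric z-score weights from knock-out data.

    expr[j][k] is the expression of gene j in experiment k.
    Assumption: experiment k knocks out gene k.
    """
    n = len(expr)
    mu = [sum(row) / len(row) for row in expr]
    sigma = [math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
             for row, m in zip(expr, mu)]

    def z(i, j):
        """Effect of knocking out gene i on gene j."""
        if sigma[j] == 0:
            return 0.0
        return abs(expr[j][i] - mu[j]) / sigma[j]

    return [[max(z(i, j), z(j, i)) if i != j else 0.0 for j in range(n)]
            for i in range(n)]
```

In a toy data set where knocking out gene 0 strongly lowers gene 1 but leaves the other genes unchanged, the pair (0, 1) receives a larger weight than unrelated pairs.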
2.2 Supervised

A great variety of different supervised machine learning methods has been developed. We limit
our evaluation to Support Vector Machines (SVMs) because they have been successfully applied
to the inference of gene regulatory networks (Mordelet and Vert, 2008) and can easily be trained
in a semi-supervised setting (Cerulo et al., 2010). We used the SVM implementation SVMLight
by Joachims (1999) for all evaluations.



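SVMs are trained by solving a constrained, quadratic optimization problem over Lagrange
multipliers $\alpha$:

$$\max_{\alpha} L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \tag{28}$$

subject to $\sum_{i=1}^{N} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$ for all $i$.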



The labels $y_i$ determine the class to which feature vector $\mathbf{x}_i$ belongs and C is the so-called
complexity parameter that needs to be tuned for optimal prediction performance. Once the optimal
Lagrange multipliers $\alpha$ are found, a feature vector can be classified by its signed distance $d(\mathbf{x})$ to
the decision boundary, which is computed as

$$d(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i^T \mathbf{x} + b \tag{29}$$

The distance $d(\mathbf{x})$ can be interpreted as a confidence value. The larger the absolute distance, the
more confident the prediction, and similar to a correlation value we interpret the distance as an
interaction weight.

In contrast to unsupervised methods, e.g. correlation methods, the supervised approach does
not directly operate on pairs of expression profiles but on feature vectors that can be constructed
in various ways. We computed the outer product of two gene expression profiles $X_i$ and $X_j$ to
construct feature vectors:

$$\mathbf{x} = X_i X_j^T \tag{30}$$

The outer product was chosen because it is commutative, and predicted interactions are therefore
symmetric and undirected. A sample set for the training of the SVM is then composed of feature
vectors $\mathbf{x}_i$ that are labeled $y_i = +1$ for gene pairs that interact and $y_i = -1$ for those that do not
interact.
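
The construction of labeled samples from expression profiles can be sketched as follows. The flattening of the outer-product matrix into a vector in row-major order, the representation of known interactions as index pairs, and both function names are assumptions for illustration. Note that swapping the two genes transposes the matrix, so the flattened vector then contains the same entries in a different order.

```python
def outer_product_features(xi, xj):
    """Flattened outer product of two expression profiles (row-major order)."""
    return [a * b for a in xi for b in xj]

def build_samples(profiles, edges):
    """Create one (feature_vector, label) sample per unordered gene pair.

    profiles: list of expression profiles, one per gene
    edges:    iterable of (i, j) index pairs for known interactions
    Label is +1 for interacting pairs and -1 otherwise; self-pairs are skipped.
    """
    known = {frozenset(e) for e in edges}
    samples = []
    n = len(profiles)
    for i in range(n):
        for j in range(i + 1, n):
            label = 1 if frozenset((i, j)) in known else -1
            samples.append((outer_product_features(profiles[i], profiles[j]), label))
    return samples
```

A network with n genes yields n(n-1)/2 samples, and a profile of length m yields feature vectors of length m squared, so the feature dimension grows quickly with the number of experiments.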

If all gene pairs are labeled, all network interactions would be known and prediction would be 
unnecessary. In practice and for evaluation purposes training is therefore performed on a set of 
labeled samples, and predictions are generated for the samples of a test set. Figure 1 depicts the
concept. All samples within the training set are labeled and all remaining gene pairs serve as test 
samples. 

Note that the term "sample" in the context of supervised learning refers to a feature vector 
derived from a pair of genes and their expression profiles, whereas a sample in an expression data
set refers to the gene expression values for a single experiment, e.g. a gene knock-out. 

We evaluate the prediction accuracy of the supervised method by generating labeled feature 
vectors for all gene pairs (samples) of a network. This entire sample set is then divided into five
parts. Each of the parts is used as a test set and the remaining four parts serve as a training set. 
The total prediction accuracy is averaged over the prediction accuracies achieved during the five 
iterations (five-fold cross-validation). 

2.3 Semi-supervised 

Data describing regulatory networks are sparse and typically only a small fraction of the true
interactions is known. The situation is even worse for negative data (non-interactions), since
experimental validation largely aims to detect but not exclude interactions. The case that all
samples within a training data set can be labeled as positive or negative is therefore rarely given
for practical network inference problems and supervised methods are limited to very small training
data sets, which negatively affects their performance.

Figure 1: Extraction of samples for the training and test set from a gene interaction network.

Semi-supervised methods strive to take advantage of the unlabeled samples within a training set 
by taking the distribution of unlabeled samples into account, and can even be trained on positively 
labeled data only. Figure 2 shows the required labeling of data for the different approaches.
Supervised methods require all samples within the training set to be labeled, while unsupervised 
methods require no labeling at all. Semi-supervised approaches can be distinguished into methods 
that need positive and negative samples and methods that operate on positive samples only. 



Figure 2: Original labeling of samples for supervised, unsupervised, semi-supervised and
positives-only prediction methods. All six samples within a sample set are generated by a four-node
network with three interactions.



The semi-supervised method used in this evaluation is based on the supervised SVM approach
described above. The only difference is in the labeling of the training set. In the semi-supervised
setting only a portion of the training samples is labeled. To enable the SVM training, which
requires all samples to be labeled, all unlabeled samples within the semi-supervised training data
are relabeled as negatives (Cerulo et al., 2010). This approach enables a direct comparison of the
same prediction algorithm trained with fully or partially labeled data.
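
The relabeling step can be sketched as below. How the labeled subset was drawn in the actual evaluation is not specified here, so the uniform random sampling, the seed parameter and the function name are assumptions for illustration.

```python
import random

def relabel_semi_supervised(labels, fraction, positives_only=False, seed=0):
    """Keep true labels for a fraction of samples; relabel the rest as negative.

    labels:         true labels, +1 (interaction) or -1 (no interaction)
    fraction:       share of samples (or of positives) that stays labeled
    positives_only: if True, only positive samples can stay labeled
    Assumption: the labeled subset is drawn uniformly at random.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    if positives_only:
        candidates = [i for i in indices if labels[i] == 1]
    else:
        candidates = indices
    keep = set(rng.sample(candidates, round(fraction * len(candidates))))
    # Every sample outside the labeled subset is treated as a negative.
    return [labels[i] if i in keep else -1 for i in indices]
```

With a fraction of 1.0 and both label types available the training labels are unchanged, which corresponds to the fully supervised setting; smaller fractions turn some true positives into apparent negatives.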

We assigned different percentages (10%,...,100%) of true positive and negative or
positive-only labels to the training set. The prediction performance of the different approaches was then
evaluated by five-fold cross-validation, with equal training/test set sizes for the supervised,
semi-supervised, positives-only and unsupervised methods compared.



3 Results 

In the following we first evaluate the prediction accuracy of unsupervised methods before compar- 
ing two selected unsupervised methods with supervised and semi-supervised approaches. 

3.1 Unsupervised methods 

Figure 3 shows the prediction accuracies measured by AUC for all unsupervised methods for three
different experimental types (knock-out, knock-down and multi-factorial) and the average AUC
(all) over the three types. Networks with 10, 30, 50, 70, 90 and 110 nodes were extracted from E.
coli and S. cerevisiae and expression data were simulated with GeneNetWeaver, with the number
of samples (experiments) equal to the number of nodes of the network. Every evaluation was repeated
10 times, so each bar represents an AUC averaged over 60 networks, or 180 networks (all).

Most obvious are the large standard deviations in prediction accuracy across all methods and 
experimental types. For small networks the accuracy of a method can easily vary between no better 
than guessing to close to perfect (see Supplementary Material). While most differences between 
methods are statistically significant (p-values < 0.01 for Wilcoxon rank sum test with Bonferroni
correction), differences are largely small and the ranking for most methods is therefore not stable 
and depends on the experimental data type, the source network, the sub-network size and other 
factors (see Supplementary Material). However, a simple Pearson's correlation is consistently the 
second-best performer for all experimental types. 

Interestingly, rank-based correlation methods (SPEARMAN, KENDALL) that are very similar 
to Pearson correlation perform very poorly on knock-out and knock-down data but well for multi- 
factorial experiments. 

With the exception of the Z-SCORE method, prediction accuracies are very low in general.
Z-SCORE was specifically designed for knock-out data and indeed clearly outperforms all other 
methods for this experimental type, despite its simplicity. It is the only unsupervised method that 
achieves a good prediction accuracy (AUC = 0.9). 
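To make the simplicity of this approach concrete, here is a minimal sketch of a z-score ranking on knock-out data. It assumes the common formulation in which the score for a candidate interaction i -> j is the absolute deviation of gene j in the knock-out of gene i from gene j's mean over the other knock-out experiments, divided by the corresponding standard deviation. Excluding the diagonal (the knocked-out gene itself), the function name `zscore_matrix` and the toy matrix are assumptions of this sketch, not details taken from the evaluation above.

```python
from statistics import mean, pstdev


def zscore_matrix(ko):
    """ko[i][j] is the expression of gene j when gene i is knocked out.

    Returns scores[i][j], the z-score of gene j in knock-out i, computed
    against the other knock-out experiments (the diagonal is excluded).
    """
    n = len(ko)
    scores = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column = [ko[i][j] for i in range(n) if i != j]
        mu, sigma = mean(column), pstdev(column)
        for i in range(n):
            if i != j and sigma > 0:
                scores[i][j] = abs(ko[i][j] - mu) / sigma
    return scores


# Toy data (made up): gene 0 activates gene 1, so knocking out gene 0
# lowers gene 1; all other entries only fluctuate slightly.
ko = [
    [0.00, 0.20, 0.50, 0.60, 0.40],
    [0.80, 0.00, 0.52, 0.61, 0.41],
    [0.82, 0.70, 0.00, 0.59, 0.39],
    [0.79, 0.72, 0.49, 0.00, 0.40],
    [0.81, 0.69, 0.51, 0.60, 0.00],
]
scores = zscore_matrix(ko)
best = max(((i, j) for i in range(5) for j in range(5) if i != j),
           key=lambda pair: scores[pair[0]][pair[1]])
print(best)
```

In this toy case the highest-scoring pair is the true interaction, because a knock-out shifts its direct target much more strongly than noise shifts unrelated genes.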

3.2 Network size 

Figure 3 summarizes results averaged over networks. We also examined how the network size
impacts the prediction performance of the various methods. The heat map in Figure 4 is based on
the same data as Figure 3 but shows the prediction accuracies (AUC) of the inference methods
on multi-factorial data for networks with different numbers of nodes (see Supplementary Material
for the related figures on knock-out and knock-down data).

The rows in Figure 4 are ordered according to mean AUC and the ranking is therefore identical
to that in the multi-factorial bar graph in Figure 3. Top performers on average are the correlation
methods by Pearson, Spearman and Kendall, with the corrected Spearman method (SPEARMAN- 
C) achieving the highest mean AUC. However, when focusing on networks of specific size, the best 
performance is achieved by the EUCLID method for small networks with 10 nodes. Other methods 
also show different behaviors with respect to network size. Correlation methods clearly achieve 
higher AUCs for large networks. Similar trends can be observed for MR, MINE, GENIE, MRNET, 
MRNET-B and CLR. In contrast, SIGMOID, PCIT and MD decrease in prediction accuracy for
growing network sizes, while the performance of RN and ARACNE is seemingly unaffected by
network size within the investigated size range.

Figure 3: Prediction accuracy (AUC) of unsupervised methods on multi-factorial, knock-out,
knock-down and averaged (all) data generated by GenNetWeaver. 10 repeats over networks with
10,...,110 nodes, extracted from E. coli and S. cerevisiae. Error bars show standard deviation.

3.3 Sample number 

Apart from the size of the network, we also expected the number of samples to have an effect on 
the prediction accuracy of the inference algorithms. GenNetWeaver generates gene expression profiles
with the same number of samples as network nodes (genes). We therefore used SynTReN to
vary network size and sample number independently. The heat map in Figure 5 shows prediction
accuracy (AUC) averaged over all inference methods for different network sizes and sample num- 
bers. SynTReN simulates expression data for multi-factorial experiments only, and networks were 
extracted from E. coli. All experiments were repeated 10 times. The results show the expected 
trend of improving accuracy with increasing number of samples and decreasing size of network. 

However, the absolute improvements in prediction accuracy are rather small with additional 
data, most likely because unsupervised methods can infer only simple network topologies reliably 
and small sample sets are sufficient for this purpose. For instance, networks with 50 nodes are 
predicted with an AUC of roughly 0.65 when 50 samples are available. Increasing the sample set
size to 110 raises the prediction accuracy only to an AUC of around 0.67.

Figure 4: Prediction accuracy (AUC) of unsupervised methods on multi-factorial data for different
network sizes. Data generated by GenNetWeaver and extracted from E. coli and S. cerevisiae.



3.4 Supervised methods 

Finally, we wanted to compare unsupervised with supervised and semi-supervised approaches. 
Because of the time-consuming training required for supervised methods we limited our evaluation 
to networks with 30 nodes extracted from E. coli networks. Expression profiles were generated 
with GenNetWeaver, and each experiment was repeated 10 times. 

Figure 6 shows the prediction accuracies (AUC) for supervised and semi-supervised methods
for three different experimental types (knock-out, knock-down and multi-factorial) and the
average AUC (all). For direct comparison, we included two unsupervised methods (Z-SCORE,
SPEARMAN) in our evaluation of supervised methods. Supervised and semi-supervised methods
are labeled "SVM" followed by the percentage of labeled data (10%, 30%, 50%, 70%, 90%, 100%).
The suffix "+" indicates that only positive data were used and "±" indicates that positive and
negative data were used. For instance, "SVM-70±" describes an SVM trained on 70% of labeled data
(positive and negative). All evaluations are five-fold cross-validated and the complexity parameter
C of the SVM was optimized via grid search (0.1, ..., 100) for each training fold.

The results show good prediction accuracies for supervised methods on all experimental types, 
with a slight advantage for knock-out data. As expected, performance increases with the per- 
centage of data labeled but there is little difference between labeling only positive data, or both 
positive and negative data. Apparently, supervised methods can be trained effectively even when 
only a portion of network interactions (positives) is known. 

Even with as little as 10% of known interactions, semi-supervised methods still outperform 
unsupervised methods for multi-factorial data. The Z-SCORE method is still the top-performing 
method on knock-out data, but supervised methods are not far behind and considerably outperform
Spearman's correlation. For knock-down data the Z-SCORE method loses its top rank, and
semi-supervised methods perform better when at least 70% of the data are labeled.

Figure 5: Prediction accuracy (AUC) averaged over all unsupervised methods on multi-factorial data for
different network sizes (nodes) and sample numbers. Data generated by SynTReN and extracted
from E. coli. 10 repeats.

To summarize, apart from the Z-SCORE method on knock-out data, supervised and semi- 
supervised approaches considerably outperform unsupervised methods and achieve good prediction 
accuracies in general for networks of this size. 



4 Discussion 



4.1 Simulated data 

While simulators such as GenNetWeaver generate expression data that are in good agreement
with biological measurements (Marbach et al., 2010), they remain incomplete models, e.g. post-
transcriptional regulation and chromatin states are missing, and an evaluation of inference methods
on real data would clearly be preferable. However, currently known network structures, even for
well-characterized organisms, are fragmentary and only partially correct representations of the
interactions between genes (Stolovitzky et al., 2007). Consequently, there is an unknown but
probably large discrepancy between the expression data measured and the observed part of the
actual network that generates them, rendering assessment of inference methods on observed gene
regulatory networks and their expression values very difficult. We therefore have limited our
evaluation to in silico benchmarks, but methods that fail for simulated data are unlikely to succeed
in the inference of real biological networks (Bansal et al., 2007).



4.2 Linear SVMs 

Another limitation of our study is the choice of linear SVMs for the evaluation of supervised 
and semi-supervised methods. We prefer linear SVMs over more-powerful non-linear methods for 
two reasons. Firstly, linear SVMs are considerably faster to train and have fewer parameters to 
optimize than non-linear SVMs - a significant advantage in a comprehensive study. Secondly, 
identifying a complex system with many variables (interaction weights) from a small number of
samples calls for a simple predictor. We also tried to evaluate transductive SVMs (Joachims, 2009)
but found them very time-consuming to train, and they achieved accuracies considerably lower
than the semi-supervised SVMs (data not shown). We therefore did not perform a full evaluation
and do not report results for transductive SVMs.

Figure 6: Prediction accuracy (AUC) of supervised methods on multi-factorial, knock-out, knock-down
and averaged (all) data generated by GenNetWeaver. Five-fold cross-validation and 10 repeats
over networks with 30 nodes, extracted from E. coli. Error bars show standard deviation.



4.3 Feature vectors 

We construct feature vectors by computing the outer product of the expression profiles of two
genes. Cerulo et al. (2010) constructed feature vectors by concatenating the two expression profiles.
The outer product results in larger feature vectors (N² vs. 2N) but is independent of the
order of the gene pair. The training set is therefore half the size compared with the concatenation
approach (n(n - 1)) and we achieved higher prediction accuracies with the linear SVM.
Cerulo et al. (2010), however, used non-linear SVMs (RBF) that might achieve the same or better
accuracies on concatenated feature vectors but are more time-consuming to train and require
two parameters (C, γ) to be optimized. It therefore remains an open question which method is
preferable.
SIRENE by Mordelet and Vert (2008) takes a different approach, with SVMs trained on feature
vectors derived from single profiles. However, it requires knowledge about the transcription factors
amongst the genes, and cannot predict interactions between target genes. Since each transcription
factor is assigned a separate SVM, feature vectors are of length N and the training set has only
n samples, the individual SVMs can be trained very efficiently, but training time is multiplied by
the number of transcription factors.



4.4 Unbalanced data sets 

Gene regulatory networks tend to be sparse, with the number of positive samples (interactions) 
typically much smaller than the number of negative samples (non-interactions). Consequently,
data sets for the training of supervised methods are heavily unbalanced, and this could have a 
negative impact on the prediction accuracy of the classifier. We therefore tried to weight positive 
and negative samples inversely to their ratio, but did not observe any improvements in prediction 
accuracy (data not shown). All evaluations in this paper were therefore performed with equally 
weighted (w = 1) samples. 
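As an illustration of weighting inversely to the class ratio, the sketch below derives per-class weights from a label vector so that both classes contribute the same total weight. The function name and the example counts (40 interactions among the 435 gene pairs of a 30-node network) are hypothetical and chosen only to show the arithmetic.

```python
def inverse_ratio_weights(labels):
    """Per-class sample weights, with the negative class fixed at 1.0.

    Positives are up-weighted by n_neg / n_pos so that the summed weight
    of the positive class equals that of the negative class.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {1: n_neg / n_pos, 0: 1.0}


# Hypothetical sparse network: 30 genes give 30 * 29 / 2 = 435 gene pairs,
# of which 40 are assumed to be true interactions.
labels = [1] * 40 + [0] * 395
weights = inverse_ratio_weights(labels)
print(weights)
```

With equally weighted samples, as used for all evaluations here, both entries would simply be 1.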



4.5 Network inference 

The evaluation results reveal large variations in prediction accuracies across all methods. Non- 
linear methods such as MINE do not perform better than linear Pearson's correlation and in 
general, we find that complex methods are no more accurate than simple methods. The Z-SCORE 
method and Pearson correlation are the two best-performing unsupervised methods. 

A detailed analysis revealed that unsupervised approaches work well for simple network topolo- 
gies (e.g. star topology) and networks with exclusively activating or inhibiting interactions, but 
fail for more-complex cases (see Supplementary Material). Mixed regulatory interactions constitute
a fundamental problem for unsupervised network inference, as depicted in Figure 7.

Figure 7: Gene A inhibits gene D and gene B activates gene D. The resulting expression profile 
of gene D is, however, most similar to that of gene C, which does not regulate gene D. 



Let gene A inhibit gene D but let gene B activate the same gene D. Given the expression
profiles of genes A and B as shown in Figure 7 and assuming identical interaction weights but
with opposite signs, the profile for gene D, resulting from a linear combination, is most similar to
that of gene C and very different from A or B. Consequently, the most-appropriate but erroneous
conclusion is to infer a regulatory relationship between C and D. Without any further information
(e.g. knock-outs, existing interactions) any method that infers interactions from the similarity of
expression profiles alone is prone to fail in this common case. Schaffter et al. (2011) identify other
common network motifs and the methods that tend to infer them incorrectly, and Krishnan et al.
(2007) show that networks of a certain complexity cannot be reverse-engineered from expression
data alone.
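The argument can be reproduced numerically with a small sketch. The profiles below are made up for illustration and are not the ones drawn in Figure 7: D is computed as a baseline plus B minus A (equal weights, opposite signs), and C is an unrelated gene whose profile happens to resemble D.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical regulator profiles over eight samples.
A = [1, 0, 1, 0, 1, 0, 1, 0]   # inhibits D
B = [1, 1, 0, 0, 1, 1, 0, 0]   # activates D
# D as a linear combination with identical weights of opposite sign.
D = [1 + b - a for a, b in zip(A, B)]
# C does not regulate D but has a similar profile.
C = [1.1, 1.9, 0.2, 1.0, 1.1, 2.1, 0.0, 0.9]

r_da, r_db, r_dc = pearson(D, A), pearson(D, B), pearson(D, C)
print(round(r_da, 3), round(r_db, 3), round(r_dc, 3))
```

In this toy case the correlation between D and C is far stronger than the correlation of D with either true regulator, so a similarity-based method ranks the spurious edge C-D first.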

5 Conclusion



Perhaps the most important observation from this evaluation is the large variance in prediction
accuracies across all methods. In agreement with Haynes and Brent (2009) we find that a large
number of replicates on networks of varying size is required for reliable estimates of the prediction
accuracy of a method. Evaluations on single data sets - especially on real data - are unsuitable
for establishing differences in the prediction accuracy of inference methods.

On average, unsupervised methods achieve very low prediction accuracies, with the notable 
exception of the Z-SCORE method, and are considerably outperformed by supervised and semi- 
supervised methods. Simple correlation methods such as Pearson correlation are as accurate as 
much more-complex methods, yet much faster and parameterless. Unsupervised methods are
appropriate for the inference only of simple networks that are entirely composed of inhibitory or 
activating interactions but not both. 

The Z-SCORE method achieved the best prediction accuracy of all methods on knock-out data, 
but has obvious limitations. For instance, the method fails when a gene is regulated by an or- 
junction of two other genes. However, the method could easily be generalized to multi-knock-out 
experiments. 

On multi-factorial data the supervised and semi-supervised methods achieved the highest ac- 
curacies; even with as few as 10% of known interactions, the semi-supervised methods still out- 
performed all unsupervised approaches. There was little difference in prediction accuracy for 
semi-supervised methods trained on positively labeled data only, compared to training on positive 
and negative samples. Apparently semi-supervised methods can effectively be trained on partial 
interaction data and non-interaction data are not essential. 

These results have important implications for the application of network inference methods 
in systems biology. Even the best methods are accurate only for small networks of relatively 
simple topology, which means that large-scale or genome-scale regulatory network inference from 
expression data alone is currently not feasible. If inference methods are to be applied to data of 
the scale generated by modern microarray platforms, a feature selection step is usually required to 
reduce the size of the inference problem; attempts to apply network inference to such large-scale 
datasets may be premature, and consideration should be given to focusing the biological question 
to use smaller-scale, higher-quality experimental data. 

Our analysis also indicates that certain kinds of biological data are more amenable to accurate
network inference than others. Most microarray datasets are most similar to our multi-factorial
simulations, which yielded poorly inferred networks with unsupervised methods. Increasing the
number of samples in the experiment (a common strategy to improve inference) does not in fact
generate the hoped-for improvements. Knock-out data are more useful: our simulations show that
they carry more information and support higher-quality inference. Biologists who wish to gain
insight into regulatory architecture should consider these limitations when designing experiments.

To summarize, small networks (as evaluated here) can be inferred with high accuracy (AUC
≈ 0.9) even with small numbers of samples using supervised techniques or the Z-SCORE method.
However, even with the best-performing methods large variations in prediction accuracy remain, 
and predictions may be limited to undirected networks without self-interactions. 

1 Unsupervised



This section contains additional data on unsupervised methods for different performance metrics
and experimental data types.



1.1 Methods 

The following three figures show the prediction performance of unsupervised methods for three
different performance measures: the area under the ROC curve (AUC), the Matthews correlation
coefficient (MCC) and the F1-score. The thresholds for the MCC and F1-score metrics
were optimized. The AUC does not have a threshold that requires optimization.
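The difference between threshold-dependent and threshold-free metrics can be sketched as follows. How the thresholds were searched is not stated here, so this sketch simply tries every observed score as a cut-off and keeps the best one; the function names and toy data are illustrative assumptions.

```python
from math import sqrt


def confusion(scores, labels, threshold):
    """Counts (tp, fp, tn, fn) when predicting an edge for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1(tp, fp, tn, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(scores, labels, metric):
    """Returns (best metric value, threshold) over all observed scores."""
    return max((metric(*confusion(scores, labels, t)), t)
               for t in sorted(set(scores)))


# Toy predictions (made up) for five gene pairs.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]
print(best_threshold(scores, labels, f1))
print(best_threshold(scores, labels, mcc))
```

Note that the two metrics can prefer different thresholds on the same predictions, which is one reason rankings may shift slightly between metrics.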

All methods were evaluated on multi-factorial, knock-out, knock-down and averaged (all) data
generated by GenNetWeaver. Each evaluation was repeated 10 times over networks with 10, ..., 110
nodes, extracted from E. coli and S. cerevisiae networks.

Figure 1: Prediction accuracy (AUC) of unsupervised methods for different experimental data
types. Error bars show standard deviation. 

While there are slight differences in the ranking of the methods depending on the chosen
performance metric, no dramatic shifts can be observed. Z-SCORE and PEARSON remain the
best-performing methods in all cases and the Z-SCORE method dominates all other methods for
knock-out data.

Figure 2: Prediction accuracy (MCC) of unsupervised methods for different experimental data
types. Error bars show standard deviation. 

1.2 Network size

This section shows the prediction performance (AUC) of the unsupervised methods for networks 
with different node numbers and for the three experimental types (multi-factorial, knock-down, 
knock-out). All expression data were simulated with GenNetWeaver and sub-networks were
extracted from E. coli.

Figure 4: Prediction accuracy (AUC) of unsupervised methods on multi-factorial data for different
network sizes. 

Figure 4 reveals that the best-performing unsupervised method on multi-factorial data is the
EUCLID method, but only on very small networks with 10 to 30 nodes. Correlation-based methods
such as PEARSON, SPEARMAN-C, SPEARMAN, KENDALL and some other methods show
better performance on larger networks (90 and 110 nodes) than on smaller networks.

Figure 5: Prediction accuracy (AUC) of unsupervised methods on knock-out data for different
network sizes. 



On knock-out data the most accurate method is the Z-SCORE method. While the prediction 
accuracy of the Z-SCORE method decreases with network size it still clearly outperforms all other 
methods for networks of all sizes (see Figure 5). There is a general trend for most methods
to perform better on the small 10-node network. Apart from PEARSON, all correlation-based
methods (SPEARMAN-C, SPEARMAN, KENDALL, MINE) achieve very low AUCs on knock-out 
data. 

Figure 6: Prediction accuracy (AUC) of unsupervised methods on knock-down data for different
network sizes. 

The results on the knock-down data shown in Figure 6 are similar to the results on the knock-out
data (see Figure 5). The Z-SCORE method remains the best-performing method. The large
majority of methods perform best on the small 10-node network - especially the EUCLID method, 
which was the best performer on networks of this size for multi-factorial data. 

1.3 Network predictions

All evaluations showed large variations in the prediction accuracy of the methods. Even for very
small networks with only 10 nodes the prediction accuracy can vary from perfect to completely 
wrong. To better understand the reasons causing the large variances we visualized the networks 
(out of 100) that were predicted with the highest and lowest accuracy, using Spearman's correlation 
as a network inference method and the AUC as performance metric. Sub-networks with 10 nodes 
were extracted from the E. coli network and expression data were simulated with GenNetWeaver. 

Figure 7: True network where Spearman's correlation failed to recover the topology (AUC =
0.508). Green means activating, and red means inhibiting interactions.

Figure 7 shows a true network where Spearman's correlation failed to recover the topology
(AUC = 0.508). Note that some interactions are activating (green) and some are
inhibiting (red), which results in more complex network dynamics than in a network with
exclusively activating or inhibiting interactions. In contrast, Figure 8 shows a true network
for which Spearman's correlation inferred the topology almost perfectly (AUC = 0.971).

Figure 8: True network where Spearman's correlation recovered the topology accurately (AUC =
0.971). The network has only activating (green) interactions. 

In general, networks with exclusively activating or inhibiting interactions and simple topologies 
(e.g. star topology) can be inferred accurately with unsupervised methods, even on multi-factorial 
data. However, networks with complex topologies or a mix of activating and inhibiting interactions 
typically cannot be recovered reliably from multi-factorial data. 

2 Supervised



This section compares supervised, semi-supervised and unsupervised methods, using three different
performance metrics: the area under the ROC curve (AUC), the Matthews correlation
coefficient (MCC) and the F1-score.

All methods were evaluated on multi-factorial, knock-out, knock-down and averaged (all) data
generated by GenNetWeaver. Five-fold cross-validation was used and each evaluation was repeated
10 times over networks with 30 nodes, extracted from E. coli.

The results show little difference in the ranking of the methods for different performance met- 
rics. The Z-SCORE method achieves the highest accuracies on the knock-out data but performs 
worst on multi-factorial data. SPEARMAN typically shows the lowest prediction accuracy and 
semi-supervised methods are effectively ranked according to the percentage of labeled data used. 
No distinction between semi-supervised methods trained on positives and negatives and methods 
trained on positives-only can be observed. 

Figure 9: Prediction accuracy (AUC) of supervised methods on multi-factorial, knock-out, knock-
down and averaged (all) data generated by GenNetWeaver. Error bars show standard deviation. 